160 research outputs found

    Load shedding in network monitoring applications

    Monitoring and mining real-time network data streams are crucial operations for managing and operating data networks. The information that network operators wish to extract from the network traffic differs in size, granularity and accuracy depending on the measurement task (e.g., the relevant data for capacity planning and for intrusion detection are very different). To satisfy these different demands, a new class of monitoring systems is emerging to handle multiple and arbitrary monitoring applications. Such systems must inevitably cope with the effects of continuous overload situations due to the large volumes, high data rates and bursty nature of network traffic. These overload situations can severely compromise the accuracy and effectiveness of monitoring systems, precisely when their results are most valuable to network operators. In this thesis, we propose a technique called load shedding as an effective and low-cost alternative to over-provisioning in network monitoring systems. It allows these systems to efficiently handle overload situations in the presence of multiple, arbitrary and competing monitoring applications. We present the design and evaluation of a predictive load shedding scheme that can shed excess load ahead of extreme traffic conditions and keep the accuracy of the monitoring applications within bounds defined by end users, while ensuring a fair allocation of computing resources to non-cooperative applications. The main novelty of our scheme is that it treats monitoring applications as black boxes, with arbitrary (and highly variable) input traffic and processing cost. Without any explicit knowledge of the application internals, the proposed scheme extracts a set of features from the traffic streams to build an on-line prediction model of the resource requirements of each monitoring application, which is used to anticipate overload situations and control the overall resource usage by sampling the input packet streams. In this way, the monitoring system preserves a high degree of flexibility, increasing the range of applications and network scenarios where it can be used. Since not all monitoring applications are robust against sampling, we then extend our load shedding scheme to support custom load shedding methods defined by end users, in order to provide a generic solution for arbitrary monitoring applications. Our scheme allows the monitoring system to safely delegate the task of shedding excess load to the applications and still guarantee fairness of service with non-cooperative users. We implemented our load shedding scheme in an existing network monitoring system and deployed it in a research ISP network. We present experimental evidence of the performance and robustness of our system with several concurrent monitoring applications during long-lived executions using real-world traffic traces.
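    The following is a minimal sketch, not the thesis implementation, of the black-box prediction-plus-sampling loop described above: an online linear model predicts each application's CPU cost from traffic features, and the input stream is sampled when the predicted aggregate cost exceeds an assumed capacity budget. All class names, the feature set, the learning rule and the budget are illustrative assumptions.

        # Hypothetical sketch of predictive load shedding for black-box monitoring applications.
        import numpy as np

        class CostPredictor:
            """Online least-squares model: predicted CPU cost = w . features."""
            def __init__(self, n_features, lr=1e-6):
                self.w = np.zeros(n_features)
                self.lr = lr

            def predict(self, features):
                return float(self.w @ features)

            def update(self, features, measured_cost):
                # Stochastic gradient step on the squared prediction error.
                error = self.predict(features) - measured_cost
                self.w -= self.lr * error * features

        def choose_sampling_rate(predictors, feature_vectors, cpu_budget):
            """Return a packet sampling rate in (0, 1] that keeps the predicted
            aggregate cost of all applications within the CPU budget."""
            predicted = sum(p.predict(f) for p, f in zip(predictors, feature_vectors))
            if predicted <= cpu_budget:
                return 1.0                              # no overload expected: keep every packet
            return max(cpu_budget / predicted, 0.01)    # otherwise shed load proportionally

    In such a loop, the features would be cheap per-batch counters (packets, bytes, distinct 5-tuples), and each application's measured cost feeds back into its predictor after every batch.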

    Performance of direct-oversampling correlator-type receivers in chaos-based DS-CDMA systems over frequency non-selective fading channels

    In this paper, we present a study of the performance of direct-oversampling correlator-type receivers in chaos-based direct-sequence code division multiple access (DS-CDMA) systems over frequency non-selective fading channels. At the input, the received signal is sampled at a rate higher than the chip rate. This oversampling step is used to precisely determine the delayed signal components arriving over the multipath fading channel, which can then be combined by a correlator to increase the SNR at its output. The main advantage of direct-oversampling correlator-type receivers is not only their low energy consumption, due to their simple structure, but also their ability to exploit the non-selective fading characteristic of multipath channels to improve overall system performance in scenarios with limited data rates and low energy requirements, such as low-rate wireless personal area networks. Discrete-time mathematical models for the conventional transmitting side with multiple-access operation, the generalized non-selective Rayleigh fading channel, and the proposed receiver are provided and described. A rough theoretical bit-error-rate (BER) expression is first derived by means of a Gaussian approximation. We then identify the main component in the expression and build its probability mass function (PMF) through numerical computation. The final BER estimate is obtained by averaging the rough expression over the possible discrete values of the PMF. To validate our findings, computer simulations are performed and the simulated performance is compared with the corresponding estimates. The obtained results show that the system performance improves as the number of paths in the channel increases.
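    As a rough illustration of the two-step estimation procedure (a Gaussian-approximation BER conditioned on one random term, then averaged over a numerically built PMF of that term), the sketch below uses a Chebyshev-map spreading sequence and takes the per-symbol chip energy as the conditioning variable. Both choices are assumptions for illustration and not the paper's exact model.

        # Illustrative two-step BER estimate: conditional Gaussian approximation averaged over a PMF.
        import numpy as np
        from math import erfc, sqrt

        def q_func(x):
            return 0.5 * erfc(x / sqrt(2.0))

        def chaotic_sequence(length, x0=0.7):
            """Chebyshev-map chaotic spreading sequence in [-1, 1] (assumed map)."""
            x, out = x0, []
            for _ in range(length):
                x = 1.0 - 2.0 * x * x
                out.append(x)
            return np.array(out)

        def estimate_ber(spreading_factor, eb_n0_db, n_trials=5000):
            eb_n0 = 10 ** (eb_n0_db / 10.0)
            # Build the PMF (here: a histogram) of the per-symbol chip energy,
            # standing in for the "main component" of the BER expression.
            energies = np.array([np.sum(chaotic_sequence(spreading_factor,
                                                         x0=np.random.uniform(0.1, 0.9)) ** 2)
                                 for _ in range(n_trials)])
            values, counts = np.unique(np.round(energies, 2), return_counts=True)
            pmf = counts / counts.sum()
            # Average the conditional Gaussian-approximation BER over the PMF.
            return float(sum(p * q_func(sqrt(2.0 * eb_n0 * e / spreading_factor))
                             for e, p in zip(values, pmf)))

        print(estimate_ber(spreading_factor=64, eb_n0_db=10))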

    Flow monitoring in software-defined networks: finding the accuracy/performance tradeoffs

    In OpenFlow-based Software-Defined Networks, obtaining flow-level measurements similar to those provided by NetFlow/IPFIX is challenging, as it requires installing an entry per flow in the flow tables. This approach does not scale well, since the number of entries the flow tables can hold is small. Moreover, labeling the flows with the application that generates the traffic would greatly enrich these reports, as it would provide very valuable information for network performance and security analysis, among others. In this paper, we present a scalable flow monitoring solution fully compatible with current off-the-shelf OpenFlow switches. Measurements are maintained in the switches and are asynchronously sent to an SDN controller. Additionally, flows are classified using a combination of DPI and Machine Learning (ML) techniques, with special focus on the identification of web and encrypted traffic. For the sake of scalability, we designed two different traffic sampling methods depending on the OpenFlow features available in the switches. We implemented our monitoring solution within OpenDaylight and evaluated it in a testbed with Open vSwitch, also using a number of DPI and ML tools to find the best tradeoff between accuracy and performance. Our experimental results using real-world traffic show that the measurement and classification systems are accurate and that the cost to deploy them is significantly reduced.
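    One common way to bound flow-table usage in this kind of design is hash-based flow sampling: only flows whose 5-tuple hashes below a threshold get a dedicated per-flow entry with counters, while the rest match a wildcard rule. The sketch below is an illustrative assumption, not the paper's OpenDaylight module or its exact sampling methods.

        # Hypothetical controller-side helper: deterministic hash-based flow sampling.
        import hashlib

        def sample_flow(five_tuple, sampling_rate=0.1):
            """Decide whether to install a dedicated OpenFlow entry for this flow.

            five_tuple: (src_ip, dst_ip, src_port, dst_port, protocol)
            Returns True for roughly `sampling_rate` of all flows, and always gives
            the same answer for the same flow (normalize direction before hashing
            if bidirectional consistency is needed)."""
            key = "|".join(map(str, five_tuple)).encode()
            digest = int.from_bytes(hashlib.sha1(key).digest()[:8], "big")
            return digest / 2**64 < sampling_rate

        # Example: install a per-flow entry with counters only when True,
        # otherwise let the flow hit a coarse wildcard rule.
        print(sample_flow(("10.0.0.1", "10.0.0.2", 5050, 80, 6)))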

    TrackSign-labeled web tracking dataset

    Recent studies show that more than 95% of the websites available on the Internet contain at least one so-called web tracking system. These systems are specialized in identifying their users by means of a plethora of different methods. Some of them (e.g., cookies) are very well known to most Internet users. However, the percentage of websites including more "obscure" and privacy-threatening systems, such as fingerprinting methods that identify a user's computer, is constantly increasing. Detecting those methods on today's Internet is very difficult, as almost any website modifies its content dynamically and minimizes its code in order to speed up loading times. This minimization and dynamicity render the website code unreadable by humans. Thus, the research community is constantly looking for new ways to discover unknown web tracking systems running under the hood. In this paper, we present a new dataset containing tracking information for more than 76 million URLs and 45 million online resources, extracted from 1.5 million popular websites. The tracking labeling process was done using a state-of-the-art web tracking discovery algorithm called TrackSign. The dataset also contains information about online security and the relations between the domains, the loaded URLs, and the online resource behind each URL. This information can be useful for different kinds of experiments, such as locating privacy-threatening resources, identifying security threats, or determining characteristics of the URL network graph. This publication is part of the Spanish I+D+i project TRAINER-A (ref. PID2020-118011GB-C21), funded by MCIN/AEI/10.13039/501100011033. This work is also supported by the Catalan Institution for Research and Advanced Studies (ICREA Academia).
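    As a usage illustration only, the sketch below turns the domain / URL / resource relations into a directed graph for queries such as "which domains load at least one tracking resource". The file layout and column names are assumptions; the actual dataset schema may differ.

        # Hypothetical loader for the domain -> URL -> resource relations (assumed CSV columns).
        import csv
        import networkx as nx

        def load_tracking_graph(path):
            g = nx.DiGraph()
            with open(path, newline="") as f:
                for row in csv.DictReader(f):        # assumed columns, for illustration
                    domain, url, resource = row["domain"], row["url"], row["resource_hash"]
                    is_tracker = row["tracking_label"] == "1"
                    g.add_edge(domain, url, kind="loads")
                    g.add_edge(url, resource, kind="serves", tracking=is_tracker)
            return g

        def domains_with_tracking(g):
            """Domains that load at least one URL serving a tracking-labeled resource."""
            tracking_urls = {u for u, v, d in g.edges(data=True) if d.get("tracking")}
            return {d for d, u, a in g.edges(data=True)
                    if a.get("kind") == "loads" and u in tracking_urls}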

    Towards accurate detection of obfuscated web tracking

    Web tracking is currently recognized as one of the most important privacy threats on the Internet. Over the last years, many methodologies have been developed to uncover web trackers. Most of them are based on static code analysis and the use of predefined blacklists. However, our main hypothesis is that web tracking has started to use obfuscated programming, a transformation of the code that renders previous detection methodologies ineffective and easy to evade. In this paper, we propose a new methodology based on dynamic code analysis that monitors the actual JavaScript calls made by the browser and compares them to the original source code of the website in order to detect obfuscated tracking. The main advantage of this approach is that detection cannot be evaded by code obfuscation. We applied this methodology to detect the use of canvas-font tracking and canvas fingerprinting on the top 10K most visited websites according to Alexa's ranking. Canvas-based tracking is a JavaScript fingerprinting method that uses the HTML5 canvas element to uniquely identify a user. Our results show that 10.44% of the top 10K websites use canvas-based tracking (canvas-font tracking or canvas fingerprinting), while obfuscation is used in 2.25% of them. These results confirm our initial hypothesis that obfuscated programming is already in use in web tracking. Finally, we argue that canvas-based tracking can be more prevalent on secondary pages than on the home page of websites.
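    The core comparison can be illustrated with a simplified sketch: canvas-related API calls observed at runtime that never appear literally in the delivered JavaScript are a hint of obfuscated tracking. How the runtime call log is captured (e.g., an instrumented browser) is outside this sketch, and the call list and function names are assumptions, not the paper's implementation.

        # Simplified dynamic-vs-static comparison for canvas-based tracking detection.
        CANVAS_CALLS = {"toDataURL", "getImageData", "measureText", "fillText"}

        def detect_obfuscated_canvas(runtime_calls, page_source):
            """runtime_calls: iterable of JS method names observed while loading the page.
            page_source: concatenated JavaScript served with the page."""
            observed = CANVAS_CALLS.intersection(runtime_calls)
            hidden = {call for call in observed if call not in page_source}
            return {
                "canvas_tracking": bool(observed),   # canvas APIs were actually invoked
                "obfuscated": bool(hidden),          # ...but are not visible in the source
                "hidden_calls": sorted(hidden),
            }

        print(detect_obfuscated_canvas(["toDataURL", "fillText"], "var x = doSomething();"))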

    A novel approach to security enhancement of chaotic DSSS systems

    In this paper, we propose a novel approach to enhancing the physical-layer security of chaotic direct-sequence spread-spectrum (DSSS) communication systems. The main idea behind our proposal is to vary the symbol period according to the behavior of the chaotic spreading sequence. As a result, the symbol period and the spreading sequence vary chaotically at the same time. This simultaneous variation aims at protecting DSSS-based communication systems from blind estimation attacks that rely on detecting the symbol period. Discrete-time models for the spreading and despreading schemes are presented and analyzed. The multiple-access performance of the proposed technique in the presence of additive white Gaussian noise (AWGN) is determined by computer simulations. The increase in security at the physical layer is also evaluated by numerical results. The obtained results show that our technique can protect the system against attacks based on detection of the symbol period, even if the intruder has full information about the chaotic sequence used.
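    A toy sketch of the idea (not the paper's exact scheme): the chaotic state itself decides how many chips the next symbol gets, so the spreading code and the symbol period vary chaotically at the same time. The map, the period range and the quantization rule below are assumptions for illustration.

        # Toy chaotic DSSS spreader with a symbol period driven by the chaotic state.
        def logistic_map(x):
            return 3.9999 * x * (1.0 - x)          # chaotic regime of the logistic map

        def spread(bits, x0=0.41, min_chips=32, max_chips=96):
            x, chips = x0, []
            for bit in bits:
                # Symbol period derived from the current chaotic state.
                n_chips = min_chips + int(x * (max_chips - min_chips))
                symbol = 1 if bit else -1
                for _ in range(n_chips):
                    x = logistic_map(x)
                    chips.append(symbol * (1 if x >= 0.5 else -1))   # +/-1 chip stream
            return chips

        # A legitimate receiver regenerates the same trajectory from x0 and so knows each
        # symbol's period; a blind attacker without x0 sees no fixed period to estimate.
        print(len(spread([1, 0, 1])))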

    Web Tracking: Mechanisms, Implications, and Defenses

    This article surveys the existing literature on the methods currently used by web services to track users online, as well as their purposes, implications, and possible user defenses. A significant majority of the reviewed articles and web resources are from the years 2012-2014. Privacy seems to be the Achilles' heel of today's web. Web services make continuous efforts to obtain as much information as they can about the things we search for, the sites we visit, the people we contact, and the products we buy. Tracking is usually performed for commercial purposes. We present five main groups of methods used for user tracking, based on sessions, client storage, client cache, fingerprinting, or other approaches. A special focus is placed on mechanisms that use web caches, operational caches, and fingerprinting, as they usually employ a variety of creative methodologies. We also show how users can be identified on the web and associated with their real names, e-mail addresses, phone numbers, or even street addresses. We show why tracking is being used and its possible implications for users (price discrimination, assessing financial credibility, determining insurance coverage, government surveillance, and identity theft). For each of the tracking methods, we present possible defenses. Apart from describing the methods and tools used to keep personal data from being tracked, we also present several tools that were used for research purposes: their main goal is to discover how and by which entity users are being tracked on their desktop computers or smartphones, provide this information to the users, and visualize it in an accessible and easy-to-follow way. Finally, we present currently proposed future approaches to tracking users and show that they can potentially pose significant threats to users' privacy.

    Comparison of Deep Packet Inspection (DPI) Tools for Traffic Classification

    Independent comparison of popular DPI tools for traffic classification

    Deep Packet Inspection (DPI) is the state-of-the-art technology for traffic classification. According to conventional wisdom, DPI is the most accurate classification technique. Consequently, most popular products, whether commercial or open-source, rely on some form of DPI for traffic classification. However, the actual performance of DPI is still unclear to the research community, since the lack of public datasets prevents the comparison and reproducibility of results. This paper presents a comprehensive comparison of 6 well-known DPI tools that are commonly used in the traffic classification literature. Our study includes 2 commercial products (PACE and NBAR) and 4 open-source tools (OpenDPI, L7-filter, nDPI, and Libprotoident). We studied their performance in various scenarios (including packet and flow truncation) and at different classification levels (application protocol, application, and web service). We carefully built a labeled dataset with more than 750K flows, which contains traffic from popular applications. We used the Volunteer-Based System (VBS), developed at Aalborg University, to guarantee the correct labeling of the dataset. We released this dataset, including full packet payloads, to the research community. We believe this dataset could become a common benchmark for the comparison and validation of network traffic classifiers. Our results show PACE, a commercial tool, to be the most accurate solution. Surprisingly, we find that some open-source tools, such as nDPI and Libprotoident, also achieve very high accuracy.
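    A generic evaluation sketch of the kind of comparison reported above: given the ground-truth label and each DPI tool's label per flow, compute overall accuracy and per-application precision/recall. The field names and flow representation are assumptions, not the paper's exact pipeline.

        # Per-tool accuracy and per-application precision/recall against ground-truth labels.
        from collections import Counter

        def evaluate(flows, tool):
            """flows: iterable of dicts with a 'truth' label and one label per tool, e.g.
            {'truth': 'HTTP', 'nDPI': 'HTTP', 'PACE': 'HTTP', ...}."""
            correct, total = 0, 0
            tp, fp, fn = Counter(), Counter(), Counter()
            for flow in flows:
                truth, guess = flow["truth"], flow.get(tool, "unknown")
                total += 1
                if guess == truth:
                    correct += 1
                    tp[truth] += 1
                else:
                    fp[guess] += 1
                    fn[truth] += 1
            per_app = {}
            for app in set(tp) | set(fp) | set(fn):
                prec = tp[app] / (tp[app] + fp[app]) if tp[app] + fp[app] else 0.0
                rec = tp[app] / (tp[app] + fn[app]) if tp[app] + fn[app] else 0.0
                per_app[app] = (prec, rec)
            return (correct / total if total else 0.0), per_app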